Code Relatives: Detecting Similar Software Behavior

Authors

  • Fang-Hsiang Su
  • Kenneth Harvey
  • Simha Sethumadhavan
  • Gail Kaiser
  • Tony Jebara

Abstract

Detecting “similar code” is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, some code fragments that behave alike without similar syntax may be missed. In this paper, we propose the term “code relatives” to refer to code with dynamically similar execution features. Code relatives can be used for tasks such as implementation-agnostic code search and the classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link-analysis-based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed more than 290 million prospective subgraph matches. The results show that DyCLINK detects not only code relatives but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average, compared with the competitor’s 61%.
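
The abstract names two ingredients, a dynamic instruction graph and link-analysis-based matching, without spelling out either. The Python sketch below illustrates the general idea under loudly stated assumptions: the trace format `(instr_id, opcode, deps)` is hypothetical, "link analysis" is taken to mean weighted PageRank, and two graphs are compared by the cosine similarity of the PageRank mass collected per opcode. That comparison is a stand-in chosen for brevity, not DyCLINK's published similarity measure, and the enumeration of candidate subgraphs is omitted entirely: whole traces are compared. The opcode names are JVM-style labels used only for illustration.

```python
import math
from collections import defaultdict


def build_instruction_graph(trace):
    """Build a weighted dependency graph from a (hypothetical) execution trace.

    trace: list of (instr_id, opcode, deps); deps lists the ids of earlier
    instructions whose results this instruction consumed. Edge weights count
    how often each dependency was observed at runtime.
    """
    opcodes = {}
    edges = defaultdict(int)
    for instr_id, opcode, deps in trace:
        opcodes[instr_id] = opcode
        for dep in deps:
            edges[(dep, instr_id)] += 1
    return opcodes, dict(edges)


def pagerank(nodes, edges, damping=0.85, iters=100, tol=1e-12):
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    for _ in range(iters):
        # Rank held by nodes with no outgoing edges is redistributed to all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def opcode_profile(opcodes, rank):
    """Total PageRank mass per opcode: a vector independent of instruction ids."""
    profile = defaultdict(float)
    for v, r in rank.items():
        profile[opcodes[v]] += r
    return dict(profile)


def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(x * x for x in p.values())) * math.sqrt(
        sum(x * x for x in q.values()))
    return dot / norm if norm else 0.0


def behavior_similarity(trace_a, trace_b):
    """Stand-in metric: cosine similarity of the two traces' opcode profiles."""
    profiles = []
    for trace in (trace_a, trace_b):
        opcodes, edges = build_instruction_graph(trace)
        rank = pagerank(list(opcodes), edges)
        profiles.append(opcode_profile(opcodes, rank))
    return cosine(profiles[0], profiles[1])


# Usage with made-up toy traces.
loop_sum = [
    (0, "iconst", []), (1, "iload", []), (2, "iadd", [0, 1]),
    (3, "iload", []), (4, "iadd", [2, 3]), (5, "ireturn", [4]),
]
renamed = [(i + 100, op, [d + 100 for d in deps]) for i, op, deps in loop_sum]
square = [(0, "iload", []), (1, "imul", [0, 0]), (2, "ireturn", [1])]

print(behavior_similarity(loop_sum, renamed))  # close to 1.0
print(behavior_similarity(loop_sum, square))   # lower: different opcode mix
```

Because the score depends only on opcodes and dependency structure, not on which instruction ids appear, the renamed trace matches its original, while a computation with a different instruction mix scores lower.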

Similar resources

Scalable Detection of Similar Code : Techniques and Applications

Similar code, also known as cloned code, commonly exists in large software. Studies show that code duplication can incur higher software maintenance cost and more software defects. Thus, detecting similar code and tracking its migration have many important applications, including program understanding, refactoring, optimization, and bug detection. This dissertation presents novel, general techn...

A Clone Detection Approach for a Collection of Similar Large-Scale Software Products

Reusing existing software, with or without modifications, is a common way to develop new large software at low cost and with high quality. So far, many techniques and tools have been proposed for detecting reused pieces in source code. However, existing tools scale poorly; they consume a lot of memory and time to detect reused pieces in large-scale software. In this paper, we proposed an ...

Similar Code Detection and Elimination for Erlang Programs

A well-known bad code smell in refactoring and software maintenance is duplicated code, that is, the existence of code clones, which are code fragments that are identical or similar to one another. Unjustified code clones increase code size, make maintenance and comprehension more difficult, and also indicate design problems such as a lack of encapsulation or abstraction. This paper describes an...

Extracting Source Level Program Similarities from Dynamic Behavior

The vast majority of work on comparing program similarities to detect software piracy either assumes the availability of the program source code (e.g., Moss) or performs a complicated source program transformation to embed carefully designed signatures, or software watermarks, into the binary code. In this paper, we propose a new approach to detecting program similarities that requires neither ...

Using Code Instrumentation for Debugging and Constraint Checking

The members of the Committee appointed to examine the thesis of FILARET ILAS find it satisfactory and recommend that it be accepted. Acknowledgments: This work would not have been possible without the support and encouragement of my advisor Dr. Orest Pilskalns under whose supervision I chose this topic and began this thesis. I would like to thank Dr. Scott Wallace for his guidance during this...

Publication date: 2015